All Questions
8 questions
5 votes
1 answer
903 views
Speed up strlen using SWAR in x86-64 assembly
The asm function strlen receives a pointer to a string as a char array. To do so, the function may use SWAR on general-purpose registers, but without using ...
3 votes
1 answer
1k views
Sum two vectors in x86 assembly
I recently made a program with C++ and ASM. Can anyone help me make this code more efficient, in the ASM part or both? I would really appreciate it because I don't know every ASM instruction and ...
4 votes
0 answers
941 views
Binary to hex in ARM64 SIMD assembly
As an exercise in learning ARM64 assembly (aka AArch64), I wrote this function to convert 64-bit binary to hexadecimal with SIMD instructions. I'm most interested in feedback on the algorithm, ...
5 votes
2 answers
506 views
SSE Assembly vs GCC Compiler - Dot Product
I am currently taking an introductory course in computer architecture. Our goal was to write a dot-product function in x86 Assembly which would use SSE and SIMD (without AVX). I am not to that ...
7 votes
1 answer
2k views
SIMD memcpy assembler implementation
I am fairly rusty with assembler, let alone the AT&T syntax. I would appreciate it if someone with more experience could please review the following memcpy implementation. Note that this will only ...
2 votes
0 answers
99 views
Write 16x16 bitmap to frame buffer
The following code writes a 16x16 bitmap to a framebuffer using up to AVX2 instructions. I'm sure it can be improved with AVX512 ...
1 vote
1 answer
379 views
HPC kernel for DGEMM: compiler vs. assembly
This is a correct version, for computing a small matrix multiplication: C += A * B, where C is ...
7 votes
1 answer
535 views
Writing SIMD libraries for C++ on FASM in x86-64 Linux
I have recently started a project to develop SIMD libraries for C++ in FASM for x86-64 Linux. I would be glad to hear any opinions or feedback about the project, the cleanliness of the code and ...